Purpose

This is a follow-along document reporting my engagement with Coding Club’s Intro to spatial analysis in R tutorial. The tutorial seems severely out of date. Some parts are represented here while others were updated using an LLM tool to update all the content so that I still get to know common practices in spatial analysis with R while also learning the newest best practice with R as a whole.

Explore raster data

A raster is a grid of equal size cells, or pixels in satellite images, and it is commonly used to represent spatially continuous data. The cells can have one or more values, or even no values for the variable of interest. In the trimmed multispectral image we will be using, each cell contains relfectance data for 12 spectral bands.

# Load data
tay <- rast('taycrop.tif')

print(tay)
## class       : SpatRaster 
## dimensions  : 507, 848, 12  (nrow, ncol, nlyr)
## resolution  : 9.217891e-05, 9.217891e-05  (x, y)
## extent      : -4.320218, -4.242051, 56.45366, 56.50039  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : taycrop.tif 
## names       : taycrop_1, taycrop_2, taycrop_3, taycrop_4, taycrop_5, taycrop_6, ...

The output will include the number of bands, dimensions, resolution, extent, and CRS (Coordinate Reference System). Notice that—even though the original file may contain 12 bands—terra stores all bands in one SpatRaster object.

We can create individual raster layers for each of the spectral bands in the raster tay.

To work on individual spectral bands, you can extract them directly from the multi-band object. With terra you use the double-bracket operator [[ ]]:

b1 <- tay[[1]]
b2 <- tay[[2]]
b3 <- tay[[3]]
b4 <- tay[[4]]
b5 <- tay[[5]]
b6 <- tay[[6]]
b7 <- tay[[7]]
b8 <- tay[[8]]
b9 <- tay[[9]]
b10 <- tay[[10]]
b11 <- tay[[11]]
b12 <- tay[[12]]

We can now compare two bands to see if they have the same extent, number of rows and column, projection, resolution and origin. As can be seen below, bands 2 and 3 match.

If you wish to compare geometries (extent, resolution, CRS) of two bands, use the compareGeom() function. Should return TRUE if they are identical

compareGeom(b2, b3)
## [1] TRUE

Checking the coordinate systems and extents of rasters is a very useful skill - quite often when you have problems with working with multiple raster objects, it’s because of differences in coordinate systems or extents.

The default plot() function in terra displays a subsample (about 100,000 pixels) to keep plots fast, while image() will render a stretched full-resolution view.

plot(b8)   # Quick plot with a pixel limit

image(b8)   # Full-resolution plot

Visualise spectral bands

You can save a plot of a specific band (e.g., b8) using the png() function. Note that the viridis package now provides its palette via the function viridis::viridis(), which is preferred over older functions.

# Save a plot of band 8 with a viridis palette
png('01tayplot.png', width = 4, height = 4, units = "in", res = 300)
image(b8, col = viridis::viridis(10, option = "D"), 
      main = "Sentinel 2 image of Loch Tay")
dev.off()   # Closes the device, ensuring no conflicts with subsequent plots
## png 
##   2

To view the plot interactively (without saving), simply run:

image(
  b8, 
  col = viridis::viridis(10, option = "D"),
  main = "Sentinel 2 image of Loch Tay"
  )

A useful way to visualise the satellite data is to plot a red-green-blue plot of a multi-layered object for a more realistic rendition. The layers or bands represent different bandwidth in the visible electromagnetic spectrum (corresponding to red, blue and green) and combined, create a naturalistic colour rendition of the earth surface.

A false-color composite (FCC) can help accentuate vegetation. For Sentinel-2, a common FCC is to replace the red band with the near infrared band (b8), use red (b4) for green, and green (b3) for blue:

# Save a false-color composite image
png('02FCC.png', width = 5, height = 4, units = "in", res = 300)
tayFCC <- c(b8, b4, b3)  # FCC: NIR, Red, Green
plotRGB(tayFCC, 
        axes = TRUE, 
        stretch = "lin", 
        main = "Sentinel False Color Composite (FCC)"
        )
dev.off()
## png 
##   2

For more flexible visualizations, you can convert a single band (e.g., b8) into a data frame and plot it using ggplot2:

library(ggplot2)

# Convert band 8 to a data frame with x, y coordinates and pixel values. This function is part of terra.
df_b8 <- as.data.frame(b8, xy = TRUE)

# Create the plot using ggplot2
p <- ggplot(df_b8, aes(x = x, y = y, fill = taycrop_8)) +
  geom_raster() +
  scale_fill_viridis_c() +
  coord_quickmap() +
  labs(
    title = "West of Loch Tay, Raster Plot",
    x = "Longitude",
    y = "Latitude"
  )+
  theme_classic() +
  theme(
    plot.title = element_text(hjust = 0.5),
    text = element_text(size = 20),
    axis.text.x = element_text(angle = 90, hjust = 1)
  )
print(p)

ggsave("03ggtay.png", p, scale = 1.5, dpi = 300)
## Saving 10.5 x 7.5 in image

Note that here we saved the plot in a slightly different way - for plots creates using ggplot2, we can use the ggsave function and we define the specifics of the saved plot after we’ve created it, whereas earlier in the tutorial when we were using the png() function in combination with dev.off(), the plot characteristics are defined before we make the plot inside the png() function.

To visualise all the bands together, we can use facet_wrap in gplot. First, we will create a stack of all the bands, so just putting them all on top of each other, like layers in a cake.

# Stack all bands using terra's c() function
t <- c(b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12)

# Convert the multi-layer raster to a data frame including an identifier for each band
t_df <- as.data.frame(t, xy = TRUE)
t_df_melt <- melt(t_df, id.vars = c("x", "y"))

# Create faceted plots for each band
p_all <- ggplot(t_df_melt, aes(x = x, y = y, fill = value)) +
  geom_raster() +
  scale_fill_viridis_c() +
  facet_wrap(~variable) +
  coord_quickmap() +
  labs(title = "Sentinel 2 Loch Tay, Raster Plots",
       x = "Longitude", y = "Latitude") +
  theme_classic() +
  theme(text = element_text(size = 20),
        axis.text.x = element_text(angle = 90, hjust = 1),
        plot.title = element_text(hjust = 0.5))
print(p_all)

ggsave("04allbands.png", p_all, scale = 1.5, dpi = 300)
## Saving 10.5 x 7.5 in image

Terra is designed as a modern replacement for the raster package, and it provides its own, very efficient plotting functions for SpatRaster objects. For example, if you have a multi-band raster file, you can quickly visualize it using terra’s built‐in plot methods without converting it into a brick first. Here’s how you can do it:

# Quickly plot all bands
plot(tay)

# For an RGB composite, you can use plotRGB
# (adjust the band numbers as appropriate for your data)

# Use the stretch argument so that terra rescales your band values for on‐the‐fly display. For example:
plotRGB(tay, r = 4, g = 3, b = 2, stretch = "lin")

# Or
plotRGB(tay, r = 4, g = 3, b = 2, stretch = "hist")

Notice the difference in colour and range of legend between the different bands. Different earth surfaces reflect the solar radiation differently and each raster layer represents how much incident solar radiation is reflected at a particular wavelength bandwidth. Bands 6 to 9 are in the Near Infrared Range (NIR). Vegetation reflects more NIR than other wavelengths but water absorbs NIR, therefore the lighter areas with high reflectance values are likely to be vegetation and the dark blue, low reflectance value areas, likely to be water. Also note that the Sentinel 2 bands have 3 levels of spatial resolution, 10 m, 20 m, and 60 m (see summary below).

10 m resolution band 2, band 3, band 4 and band 8

20 m resolution band 5, band 6, band 7, band 11 and band 12

60 m resolution band 1, band 9 and band 10

Manipulate rasters: NDVI and KMN classification

The Normalised Difference Vegetation Index (NDVI) is a widely used vegetation index that quantifies vegetation presence, health or structure. It is calculated using the Near Infrared (NIR) and Red bandwith of the spectrum. Healthy vegetation reflects light strongly in the NIR part of the spectrum and absorbs light in red part of the visible spectrum for photosynthesis. A high ratio between light refected in the NIR part of the spectrum and light reflected in the red part of the spectrum would represent areas that potentially have healthy vegetation. It is worth noting that different plant species absorb light in the red part of the spectrum at different rates. The same plant will also absorb light in the red band differently depending on whether it is stressed or healthy, or the time of year. It is often used over large areas as an indication of land cover change.

The NDVI ratio is calculated using (NIR - Red) / (NIR + Red). For example, a pixel with an NDVI of less than 0.2 is not likely to be dominated by vegetation, and an NDVI of 0.6 and above is likely to be dense vegetation.

For Sentinel-2 images, the NIR band is band 8 and the red band is band 4. Below, we define a general function for computing any two-band index and apply it to calculate NDVI from a multi-layer raster (e.g. a brick or SpatRaster).

# Assume 's_tay' is a multi-layer SpatRaster loaded from 'taycrop.tif'
# If using terra, you can combine bands with c() rather than stack()

# Define a vegetation index function:
VI <- function(img, k, i) {
  # Extract the specified bands
  bk <- img[[k]]
  bi <- img[[i]]
  # Calculate the index
  vi <- (bk - bi) / (bk + bi)
  return(vi)
}

# For Sentinel-2: NIR = band 8, Red = band 4

ndvi <- VI(tay, 8, 4)

# Plot NDVI using a reversed terrain color palette
png('05ndviplot.png', width = 4, height = 4, units = "in", res = 300)
plot(ndvi, col = rev(terrain.colors(10)), main = 'Sentinel 2, Loch Tay - NDVI')
dev.off()
## png 
##   2

To find out the distribution of the pixel NDVI values, we can plot a histogram.

png('06ndvihist.png', width = 4, height = 4, units = "in", res = 300)
terra::hist(ndvi,
     main = "Distribution of NDVI values",
     xlab = "NDVI",
     ylab = "Frequency",
     col = "aquamarine3",
     xlim = c(-0.5, 1),
     breaks = 30,
     xaxt = 'n')
axis(side = 1, at = seq(-0.5, 1, 0.05), labels = seq(-0.5, 1, 0.05))
dev.off()
## png 
##   2

So what does this mean?

The histogram is strongly skewed to the right, towards highh NDVI values, indicating a highly vegetated area.

Now that we know that this area has lots of vegetation, we can also mask the pixels with an NDVI value of less than 0.4 (less likely to be vegetation) to highlight where the vegetated areas occur.

png('07ndvimask.png', width = 4, height = 4, units = "in", res = 300)
veg <- classify(ndvi, cbind(-Inf, 0.4, NA))
plot(veg, main = 'Veg cover')
dev.off()
## png 
##   2

How can we save the raster itself, not just plots?

We might want to export the NDVI raster we just created to use in QGIS or other software, or to save it for further use in R.

To save a raster object, use the writeraster function. Saving the data as integers rather than floats requires less memory and processing for the computer to handle. A float is a term used to describe a variable with a fractional value or decimals, e.g. 0.002.

ndvi_scaled <- round(ndvi * 100)
terra::writeRaster(
  x = ndvi_scaled,
  filename = './08tay_ndvi_scaled_2018.tif',
  filetype = 'GTiff',
  datatype = 'INT2S',
  overwrite = TRUE
)

Raster operations also allow us to perform an unsupervised classification, or a clustering of the pixels, in the satellite image. In this context, unsupervised means that we are not using training data for the clustering.

This type of classification can be useful when not a lot is known about an area. In the example below, we are going to use the kmeans algorithm. The algorithm groups pixels that have similar spectral properties in the same cluster. We are going to create 10 clusters using the NDVI raster we have just created above, but first, we need to convert the raster into an array, which is the object format required for the classification.

# convert the raster to vector/matrix (`terra:values` converts the RasterLAyer to array) )
nr <- terra::values(ndvi)

# important to set the seed generator because `kmeans` initiates the centres in random locations
# the seed generator just generates random numbers
set.seed(99)

# create 10 clusters, allow 500 iterations, start with 5 random sets using 'Lloyd' method
kmncluster <- kmeans(
  na.omit(nr), 
  centers = 10, 
  iter.max = 500,
  nstart = 5,
  algorithm = "Lloyd"
)
str(kmncluster)
## List of 9
##  $ cluster     : int [1:429936] 7 7 7 7 7 7 7 4 4 7 ...
##  $ centers     : num [1:10, 1] -0.525 0.421 0.846 0.696 0.233 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:10] "1" "2" "3" "4" ...
##   .. ..$ : chr "taycrop_8"
##  $ totss       : num 156503
##  $ withinss    : num [1:10] 67.4 23.6 28.1 28.5 32.4 ...
##  $ tot.withinss: num 484
##  $ betweenss   : num 156020
##  $ size        : int [1:10] 4650 8155 71872 45902 10754 58153 58643 18762 4109 148936
##  $ iter        : int 287
##  $ ifault      : NULL
##  - attr(*, "class")= chr "kmeans"

Kmeans returns an object with 9 elements. The length of the cluster element within kmncluster is 429936 which is the same as the length of nr created from the ndvi object. The cell values of kmncluster$cluster range between 1 to 10 corresponding to the input number of clusters we provided in the kmeans() function. kmncluster$cluster indicates the cluster label for the corresponding pixel.

Our classification is now complete, and to visualise the results, we need to convert the kmncluster$cluster array back to a RasterLayer of the same dimension as the ndvi object.

# First create a copy of the ndvi layer
knr <- ndvi

# Now replace raster cell values with kmncluster$cluster array
knr[] <- kmncluster$cluster

# Alternative way to achieve the same result
terra::values(knr) <- kmncluster$cluster
knr
## class       : SpatRaster 
## dimensions  : 507, 848, 1  (nrow, ncol, nlyr)
## resolution  : 9.217891e-05, 9.217891e-05  (x, y)
## extent      : -4.320218, -4.242051, 56.45366, 56.50039  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source(s)   : memory
## varname     : taycrop 
## name        : taycrop_8 
## min value   :         1 
## max value   :        10

We can see that knr is a RasterLayer (here it reads SpatRaster) with 429,936 cells, but we do not know which cluster (1-10) belongs what land cover or vegetation type. One way of attributing a class to a land cover type is by plotting the cluster side-by-side with a reference layer of land cover and using unique colours for each cluster. As we don’t have one for our example area, we can use the NDVI map we created earlier or the RGB plot.

par(mfrow = c(1, 2))
plot(ndvi, col = terra::rev(terrain.colors(10)), main = 'NDVI')

# Use the viridis palette for clusters
plot(knr, main = "Kmeans", col = viridis::viridis(10))

If we want to plot our classification alongside the RGB rendering of the raster, and save the two plots, we can use the code below:

# Create an RGB composite using bands 4, 3, and 2
tayRGB <- c(tay[[4]], tay[[3]], tay[[2]])

png('09rgb_kmeans.png', width = 10, height = 8, units = "in", res = 300)
par(mar = c(10.8, 5, 10.8, 2), mfrow = c(1, 2))
plotRGB(tayRGB, axes = TRUE, stretch = "lin", main = "RGB")
plot(knr, main = "Kmeans", yaxt = 'n', col = viridis::viridis(10))
dev.off()
## png 
##   2

A simple classification like this one is only to give an idea of land cover types. In the above example, we could deduce that cluster 8, in green, is water as it covers the Loch. We can also spot patterns in the vegetation cover in both the NDVI and kmeans cluster plots. We could deduce that the areas with the highest NDVI ratio are likely to be forest cover.

Exercise: Using the NDVI, RGB and kmeans plot, can you deduce other land cover around the Loch Tay area?

Conclusion In this introduction to remote sensing spatial analysis, we have covered how to:

Import a GeoTIFF file as a raster in R. Extract layers from a multi-layer raster objects and get the raster properties. Explore raster visulaisation of single and mutil-layered object with rasterVis, ggplot and base R. Explore raster manipulations by calculating and plotting the NDVI ratio of the pixels in our image. Perform an unsupervised image classification using the kmeans algorithm to cluster the pixels in 10 clusters. If you want to explore further, there are excellent resources availabe in the Spatial Data Science with R by Robert J. Hijmans. This resource seems to be continuosly updated. For example, Hijmans uses terra already over there.